{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 用户数据处理\n",
    "（只取训练集和测试集中出现的用户ID）\n",
    "\n",
    "数据来源于Kaggle竞赛：Event Recommendation Engine Challenge，根据\n",
    "events they’ve responded to in the past\n",
    "user demographic information\n",
    "what events they’ve seen and clicked on in our app\n",
    "用户对某个活动是否感兴趣\n",
    "\n",
    "竞赛官网：\n",
    "https://www.kaggle.com/c/event-recommendation-engine-challenge/data\n",
    "\n",
    "用户描述信息在users.csv文件：共7维特征\n",
    "user_id\n",
    "locale：地区，语言\n",
    "birthyear：出身年\n",
    "gender：性别\n",
    "joinedAt：用户加入APP的时间，ISO-8601 UTC time\n",
    "location：地点\n",
    "timezone：时区"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 导入工具包"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "\n",
    "import numpy as np\n",
    "import scipy.sparse as ss\n",
    "import scipy.io as sio\n",
    "\n",
    "#保存数据\n",
    "import pickle\n",
    "\n",
    "#event的特征需要编码\n",
    "from PE_utils import FeatureEng\n",
    "from sklearn.preprocessing import normalize\n",
    "#相似度/距离\n",
    "import scipy.spatial.distance as ssd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "总的用户数目超过训练集和测试集中的用户，\n",
    "为节省处理时间和内存，先去处理train和test，得到竞赛需要用到的事件和用户\n",
    "然后对在训练集和测试集中出现过的事件和用户建立新的ID索引\n",
    "先运行user_event.ipynb,\n",
    "得到事件列表文件：PE_userIndex.pkl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "path='../../event_recommendation_engine_challenge_data/'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 读取之前算好的测试集和训练集中出现过的用户"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "number of users in train & test :3391\n"
     ]
    }
   ],
   "source": [
    "#读取训练集和测试集中出现过的用户列表\n",
    "userIndex = pickle.load(open(path+\"PE_userIndex.pkl\", 'rb'))\n",
    "n_users = len(userIndex)\n",
    "\n",
    "print(\"number of users in train & test :%d\" % n_users)"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "# 处理users.csv --> 特征编码、用户之间的相似度"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{b'3929074393': 0,\n",
       " b'2833011810': 1,\n",
       " b'3188496109': 2,\n",
       " b'3804368962': 3,\n",
       " b'3813354209': 4,\n",
       " b'1485684679': 5,\n",
       " b'402625828': 6,\n",
       " b'3064452030': 7,\n",
       " b'4243934665': 8,\n",
       " b'2626409021': 9,\n",
       " b'2424565945': 10,\n",
       " b'1800952806': 11,\n",
       " b'4170453600': 12,\n",
       " b'1665746866': 13,\n",
       " b'3765513583': 14,\n",
       " b'246547106': 15,\n",
       " b'3719575513': 16,\n",
       " b'3599572670': 17,\n",
       " b'3191212256': 18,\n",
       " b'2954540407': 19,\n",
       " b'1210062900': 20,\n",
       " b'2824647204': 21,\n",
       " b'1336647234': 22,\n",
       " b'3913824397': 23,\n",
       " b'292529766': 24,\n",
       " b'1301945337': 25,\n",
       " b'3162849104': 26,\n",
       " b'2251800772': 27,\n",
       " b'1039126403': 28,\n",
       " b'2461816241': 29,\n",
       " b'2401692695': 30,\n",
       " b'3391264132': 31,\n",
       " b'2196432988': 32,\n",
       " b'2701755218': 33,\n",
       " b'2505058685': 34,\n",
       " b'634885878': 35,\n",
       " b'767269269': 36,\n",
       " b'724978354': 37,\n",
       " b'528289771': 38,\n",
       " b'2378242378': 39,\n",
       " b'3797537201': 40,\n",
       " b'1640786388': 41,\n",
       " b'1747042580': 42,\n",
       " b'3514195773': 43,\n",
       " b'929368698': 44,\n",
       " b'2196784941': 45,\n",
       " b'2041677127': 46,\n",
       " b'1637866930': 47,\n",
       " b'3093978794': 48,\n",
       " b'19283444': 49,\n",
       " b'3387377708': 50,\n",
       " b'1484078818': 51,\n",
       " b'2936934993': 52,\n",
       " b'3330527170': 53,\n",
       " b'870439648': 54,\n",
       " b'3378002804': 55,\n",
       " b'3242215738': 56,\n",
       " b'1392003365': 57,\n",
       " b'1995670379': 58,\n",
       " b'1989811285': 59,\n",
       " b'2380308964': 60,\n",
       " b'4116824604': 61,\n",
       " b'4000106370': 62,\n",
       " b'1781308359': 63,\n",
       " b'3891396601': 64,\n",
       " b'2202271093': 65,\n",
       " b'3382101856': 66,\n",
       " b'105347632': 67,\n",
       " b'2878125264': 68,\n",
       " b'2143805175': 69,\n",
       " b'876103013': 70,\n",
       " b'1044598291': 71,\n",
       " b'3705805946': 72,\n",
       " b'1377585598': 73,\n",
       " b'971079958': 74,\n",
       " b'3798379681': 75,\n",
       " b'2326237915': 76,\n",
       " b'2690867178': 77,\n",
       " b'1298546667': 78,\n",
       " b'2937013787': 79,\n",
       " b'1186383301': 80,\n",
       " b'2874765790': 81,\n",
       " b'2069512156': 82,\n",
       " b'2853688818': 83,\n",
       " b'1784981569': 84,\n",
       " b'4190165036': 85,\n",
       " b'2178409601': 86,\n",
       " b'3287283918': 87,\n",
       " b'3526927399': 88,\n",
       " b'2222162186': 89,\n",
       " b'3069569833': 90,\n",
       " b'2141063849': 91,\n",
       " b'3833637800': 92,\n",
       " b'2033574675': 93,\n",
       " b'140576550': 94,\n",
       " b'536673024': 95,\n",
       " b'1684440449': 96,\n",
       " b'895967408': 97,\n",
       " b'1429212077': 98,\n",
       " b'3488552280': 99,\n",
       " b'2788859591': 100,\n",
       " b'2345460195': 101,\n",
       " b'2225154987': 102,\n",
       " b'822628520': 103,\n",
       " b'500590864': 104,\n",
       " b'1476956307': 105,\n",
       " b'1925883781': 106,\n",
       " b'3338007773': 107,\n",
       " b'4157381568': 108,\n",
       " b'4262553869': 109,\n",
       " b'3669515588': 110,\n",
       " b'3943175229': 111,\n",
       " b'699877746': 112,\n",
       " b'3220398573': 113,\n",
       " b'1595465532': 114,\n",
       " b'3500240032': 115,\n",
       " b'952121162': 116,\n",
       " b'2008761267': 117,\n",
       " b'321386359': 118,\n",
       " b'2386480530': 119,\n",
       " b'1539932789': 120,\n",
       " b'1607711586': 121,\n",
       " b'3838037746': 122,\n",
       " b'555727671': 123,\n",
       " b'840945029': 124,\n",
       " b'722675569': 125,\n",
       " b'4217391078': 126,\n",
       " b'4098063101': 127,\n",
       " b'3486057072': 128,\n",
       " b'1082691947': 129,\n",
       " b'2223139042': 130,\n",
       " b'4032639394': 131,\n",
       " b'1279721997': 132,\n",
       " b'2495407859': 133,\n",
       " b'2458068456': 134,\n",
       " b'3867877244': 135,\n",
       " b'4246087593': 136,\n",
       " b'3169024744': 137,\n",
       " b'4214369322': 138,\n",
       " b'2932877456': 139,\n",
       " b'3481754148': 140,\n",
       " b'812299615': 141,\n",
       " b'1072368701': 142,\n",
       " b'1920804065': 143,\n",
       " b'627948360': 144,\n",
       " b'1013893828': 145,\n",
       " b'1699374488': 146,\n",
       " b'4228123262': 147,\n",
       " b'761616937': 148,\n",
       " b'2369345147': 149,\n",
       " b'1292184871': 150,\n",
       " b'943483502': 151,\n",
       " b'29002074': 152,\n",
       " b'3009580143': 153,\n",
       " b'3855705749': 154,\n",
       " b'4092048823': 155,\n",
       " b'1757874805': 156,\n",
       " b'3507630779': 157,\n",
       " b'3173136187': 158,\n",
       " b'3216203623': 159,\n",
       " b'3636724939': 160,\n",
       " b'4239378941': 161,\n",
       " b'67355582': 162,\n",
       " b'3628146836': 163,\n",
       " b'282865908': 164,\n",
       " b'1622015724': 165,\n",
       " b'2630103442': 166,\n",
       " b'2162823560': 167,\n",
       " b'3289417890': 168,\n",
       " b'1611204256': 169,\n",
       " b'3393069828': 170,\n",
       " b'2577093097': 171,\n",
       " b'3633091100': 172,\n",
       " b'427369081': 173,\n",
       " b'3863123259': 174,\n",
       " b'1618153505': 175,\n",
       " b'85640691': 176,\n",
       " b'3062811490': 177,\n",
       " b'3300006570': 178,\n",
       " b'1615717851': 179,\n",
       " b'2287508577': 180,\n",
       " b'2844491503': 181,\n",
       " b'912344156': 182,\n",
       " b'602551909': 183,\n",
       " b'3170052426': 184,\n",
       " b'1805653308': 185,\n",
       " b'1256698929': 186,\n",
       " b'1438134738': 187,\n",
       " b'938766697': 188,\n",
       " b'1565310892': 189,\n",
       " b'3467307298': 190,\n",
       " b'555631167': 191,\n",
       " b'3833491880': 192,\n",
       " b'3037654114': 193,\n",
       " b'3118938050': 194,\n",
       " b'2872854546': 195,\n",
       " b'4134124728': 196,\n",
       " b'3481169628': 197,\n",
       " b'1717882536': 198,\n",
       " b'3071657222': 199,\n",
       " b'2413535648': 200,\n",
       " b'3627636732': 201,\n",
       " b'1785235519': 202,\n",
       " b'1342404851': 203,\n",
       " b'4104928393': 204,\n",
       " b'2197210107': 205,\n",
       " b'3973406086': 206,\n",
       " b'1832146052': 207,\n",
       " b'3642118045': 208,\n",
       " b'3456890538': 209,\n",
       " b'2536109970': 210,\n",
       " b'2671029253': 211,\n",
       " b'344313409': 212,\n",
       " b'670504249': 213,\n",
       " b'2381156995': 214,\n",
       " b'806278943': 215,\n",
       " b'3617436869': 216,\n",
       " b'848363602': 217,\n",
       " b'3772044787': 218,\n",
       " b'1580848851': 219,\n",
       " b'439279583': 220,\n",
       " b'604530209': 221,\n",
       " b'2451466347': 222,\n",
       " b'741804596': 223,\n",
       " b'635462786': 224,\n",
       " b'466698547': 225,\n",
       " b'1897267951': 226,\n",
       " b'2560745948': 227,\n",
       " b'2762021126': 228,\n",
       " b'2845673972': 229,\n",
       " b'1017536864': 230,\n",
       " b'248805577': 231,\n",
       " b'1482478167': 232,\n",
       " b'3422212913': 233,\n",
       " b'3031311007': 234,\n",
       " b'1528359455': 235,\n",
       " b'2069145631': 236,\n",
       " b'3757971767': 237,\n",
       " b'2515029406': 238,\n",
       " b'1906220044': 239,\n",
       " b'2722010150': 240,\n",
       " b'3987610774': 241,\n",
       " b'4063804162': 242,\n",
       " b'3715196259': 243,\n",
       " b'2156920475': 244,\n",
       " b'218125888': 245,\n",
       " b'540360566': 246,\n",
       " b'2296487275': 247,\n",
       " b'1296007357': 248,\n",
       " b'2500150280': 249,\n",
       " b'1317057097': 250,\n",
       " b'2793393083': 251,\n",
       " b'4224531010': 252,\n",
       " b'973677184': 253,\n",
       " b'3956887319': 254,\n",
       " b'393035612': 255,\n",
       " b'1815721382': 256,\n",
       " b'3661711486': 257,\n",
       " b'2971889961': 258,\n",
       " b'2101691514': 259,\n",
       " b'567820539': 260,\n",
       " b'2006663136': 261,\n",
       " b'559905447': 262,\n",
       " b'845573336': 263,\n",
       " b'43965236': 264,\n",
       " b'1377253602': 265,\n",
       " b'4165482530': 266,\n",
       " b'1040473939': 267,\n",
       " b'3866909611': 268,\n",
       " b'366365683': 269,\n",
       " b'1551265824': 270,\n",
       " b'221717938': 271,\n",
       " b'1174924783': 272,\n",
       " b'3399370476': 273,\n",
       " b'3833425001': 274,\n",
       " b'2838808644': 275,\n",
       " b'419456233': 276,\n",
       " b'3261535803': 277,\n",
       " b'1964994682': 278,\n",
       " b'3292982761': 279,\n",
       " b'1544786140': 280,\n",
       " b'4040113414': 281,\n",
       " b'2717402855': 282,\n",
       " b'4044282675': 283,\n",
       " b'3695179498': 284,\n",
       " b'567536219': 285,\n",
       " b'4118821425': 286,\n",
       " b'254874841': 287,\n",
       " b'2205166542': 288,\n",
       " b'3239836164': 289,\n",
       " b'2883315236': 290,\n",
       " b'3466521212': 291,\n",
       " b'3843287866': 292,\n",
       " b'717731951': 293,\n",
       " b'2590902433': 294,\n",
       " b'3284221169': 295,\n",
       " b'4155564936': 296,\n",
       " b'1094788037': 297,\n",
       " b'2964090531': 298,\n",
       " b'2332715028': 299,\n",
       " b'283716061': 300,\n",
       " b'599144899': 301,\n",
       " b'629450757': 302,\n",
       " b'1023714879': 303,\n",
       " b'253594379': 304,\n",
       " b'2247085544': 305,\n",
       " b'3046595544': 306,\n",
       " b'867285031': 307,\n",
       " b'1794241017': 308,\n",
       " b'1269929715': 309,\n",
       " b'2615377525': 310,\n",
       " b'2792856754': 311,\n",
       " b'4005906279': 312,\n",
       " b'1115244043': 313,\n",
       " b'2698716586': 314,\n",
       " b'2827067982': 315,\n",
       " b'2318415276': 316,\n",
       " b'77627621': 317,\n",
       " b'1803816912': 318,\n",
       " b'4159459029': 319,\n",
       " b'3482266552': 320,\n",
       " b'223726744': 321,\n",
       " b'1338460474': 322,\n",
       " b'1607509210': 323,\n",
       " b'3664264982': 324,\n",
       " b'4095465484': 325,\n",
       " b'2394083016': 326,\n",
       " b'980154606': 327,\n",
       " b'1117272184': 328,\n",
       " b'2321085380': 329,\n",
       " b'65116502': 330,\n",
       " b'763694881': 331,\n",
       " b'4181452333': 332,\n",
       " b'3255013048': 333,\n",
       " b'2940390801': 334,\n",
       " b'3655365717': 335,\n",
       " b'4206209185': 336,\n",
       " b'3445686240': 337,\n",
       " b'1594431230': 338,\n",
       " b'763143562': 339,\n",
       " b'4244109605': 340,\n",
       " b'2884249863': 341,\n",
       " b'1109967945': 342,\n",
       " b'831595223': 343,\n",
       " b'2817061625': 344,\n",
       " b'4048868881': 345,\n",
       " b'2456225582': 346,\n",
       " b'1195601618': 347,\n",
       " b'1486339292': 348,\n",
       " b'3138635490': 349,\n",
       " b'96920130': 350,\n",
       " b'180191037': 351,\n",
       " b'2931803878': 352,\n",
       " b'2436389418': 353,\n",
       " b'929841430': 354,\n",
       " b'1860318569': 355,\n",
       " b'1522287705': 356,\n",
       " b'3376295099': 357,\n",
       " b'4100120866': 358,\n",
       " b'1025593498': 359,\n",
       " b'4206915288': 360,\n",
       " b'2768250886': 361,\n",
       " b'2980440229': 362,\n",
       " b'1824088561': 363,\n",
       " b'87299174': 364,\n",
       " b'3909997753': 365,\n",
       " b'1313619105': 366,\n",
       " b'3625917937': 367,\n",
       " b'1515828822': 368,\n",
       " b'1525918364': 369,\n",
       " b'82923911': 370,\n",
       " b'732078356': 371,\n",
       " b'3951036457': 372,\n",
       " b'508837021': 373,\n",
       " b'3798708300': 374,\n",
       " b'2030247174': 375,\n",
       " b'3909374079': 376,\n",
       " b'2288466976': 377,\n",
       " b'309131167': 378,\n",
       " b'3719137321': 379,\n",
       " b'1847477018': 380,\n",
       " b'2028372118': 381,\n",
       " b'3587335204': 382,\n",
       " b'2400147519': 383,\n",
       " b'1707278925': 384,\n",
       " b'261990895': 385,\n",
       " b'3120509854': 386,\n",
       " b'1948945052': 387,\n",
       " b'2856937853': 388,\n",
       " b'2858284447': 389,\n",
       " b'3878851138': 390,\n",
       " b'374464774': 391,\n",
       " b'3415673411': 392,\n",
       " b'3251568850': 393,\n",
       " b'2054648409': 394,\n",
       " b'3787931088': 395,\n",
       " b'393556601': 396,\n",
       " b'2706298619': 397,\n",
       " b'3191592351': 398,\n",
       " b'439507113': 399,\n",
       " b'4181468602': 400,\n",
       " b'1732859563': 401,\n",
       " b'4005100950': 402,\n",
       " b'3272389348': 403,\n",
       " b'469837401': 404,\n",
       " b'776025327': 405,\n",
       " b'442080465': 406,\n",
       " b'546007936': 407,\n",
       " b'3153655575': 408,\n",
       " b'2815174275': 409,\n",
       " b'569826712': 410,\n",
       " b'2265864872': 411,\n",
       " b'653909548': 412,\n",
       " b'2962003608': 413,\n",
       " b'2362718183': 414,\n",
       " b'777634122': 415,\n",
       " b'249577815': 416,\n",
       " b'1208562840': 417,\n",
       " b'4125376571': 418,\n",
       " b'3946163681': 419,\n",
       " b'2712018445': 420,\n",
       " b'3365349866': 421,\n",
       " b'1075375005': 422,\n",
       " b'3904948602': 423,\n",
       " b'1561107146': 424,\n",
       " b'694709965': 425,\n",
       " b'2993544760': 426,\n",
       " b'3519404949': 427,\n",
       " b'4231656286': 428,\n",
       " b'696967670': 429,\n",
       " b'640550522': 430,\n",
       " b'128581247': 431,\n",
       " b'3245449378': 432,\n",
       " b'3810743959': 433,\n",
       " b'1114548338': 434,\n",
       " b'3292004587': 435,\n",
       " b'3746792233': 436,\n",
       " b'3210947692': 437,\n",
       " b'4162769577': 438,\n",
       " b'3221545220': 439,\n",
       " b'2030266367': 440,\n",
       " b'4084719479': 441,\n",
       " b'2326463380': 442,\n",
       " b'1952989987': 443,\n",
       " b'354974889': 444,\n",
       " b'1612038297': 445,\n",
       " b'2043422334': 446,\n",
       " b'2074086523': 447,\n",
       " b'2978622176': 448,\n",
       " b'1050303296': 449,\n",
       " b'2875507009': 450,\n",
       " b'2911203191': 451,\n",
       " b'2360121322': 452,\n",
       " b'2600683125': 453,\n",
       " b'1166204486': 454,\n",
       " b'393817534': 455,\n",
       " b'3724535659': 456,\n",
       " b'2183586456': 457,\n",
       " b'4205646791': 458,\n",
       " b'3987684633': 459,\n",
       " b'2010045207': 460,\n",
       " b'3156650528': 461,\n",
       " b'306153925': 462,\n",
       " b'210296799': 463,\n",
       " b'2177461994': 464,\n",
       " b'3870800241': 465,\n",
       " b'3418659271': 466,\n",
       " b'3182595870': 467,\n",
       " b'2304075585': 468,\n",
       " b'806246894': 469,\n",
       " b'4026656138': 470,\n",
       " b'3434508639': 471,\n",
       " b'648931301': 472,\n",
       " b'146035325': 473,\n",
       " b'1127657981': 474,\n",
       " b'2940917694': 475,\n",
       " b'2098666939': 476,\n",
       " b'1049481453': 477,\n",
       " b'1177603447': 478,\n",
       " b'2272113940': 479,\n",
       " b'2815688436': 480,\n",
       " b'4099439378': 481,\n",
       " b'80379054': 482,\n",
       " b'3308173668': 483,\n",
       " b'2902208687': 484,\n",
       " b'97062150': 485,\n",
       " b'3954972720': 486,\n",
       " b'4200601044': 487,\n",
       " b'2356937947': 488,\n",
       " b'2160680517': 489,\n",
       " b'1743103684': 490,\n",
       " b'3523009435': 491,\n",
       " b'384932005': 492,\n",
       " b'3677612174': 493,\n",
       " b'2803741841': 494,\n",
       " b'2967464230': 495,\n",
       " b'2914443596': 496,\n",
       " b'3635047946': 497,\n",
       " b'3147653776': 498,\n",
       " b'1627419158': 499,\n",
       " b'2495110802': 500,\n",
       " b'2706413631': 501,\n",
       " b'1914182220': 502,\n",
       " b'3555067038': 503,\n",
       " b'91139927': 504,\n",
       " b'2928822295': 505,\n",
       " b'778295633': 506,\n",
       " b'1586029420': 507,\n",
       " b'384269001': 508,\n",
       " b'1793252788': 509,\n",
       " b'1302145719': 510,\n",
       " b'2187324082': 511,\n",
       " b'1101021964': 512,\n",
       " b'2936324533': 513,\n",
       " b'37106093': 514,\n",
       " b'3026296984': 515,\n",
       " b'666534021': 516,\n",
       " b'3600455391': 517,\n",
       " b'1062984942': 518,\n",
       " b'3421486832': 519,\n",
       " b'2819638620': 520,\n",
       " b'1696361679': 521,\n",
       " b'607394331': 522,\n",
       " b'852034251': 523,\n",
       " b'2441936878': 524,\n",
       " b'1047741032': 525,\n",
       " b'50457238': 526,\n",
       " b'3041705231': 527,\n",
       " b'141849067': 528,\n",
       " b'1515805774': 529,\n",
       " b'3608734461': 530,\n",
       " b'1073821707': 531,\n",
       " b'3627571763': 532,\n",
       " b'3440770225': 533,\n",
       " b'1943715600': 534,\n",
       " b'2328481418': 535,\n",
       " b'326930040': 536,\n",
       " b'1345896548': 537,\n",
       " b'1068430193': 538,\n",
       " b'1169270868': 539,\n",
       " b'4045020395': 540,\n",
       " b'296126900': 541,\n",
       " b'1805025807': 542,\n",
       " b'1567970024': 543,\n",
       " b'3253486500': 544,\n",
       " b'609331956': 545,\n",
       " b'3647778578': 546,\n",
       " b'2489943153': 547,\n",
       " b'1868178875': 548,\n",
       " b'184140061': 549,\n",
       " b'2910006718': 550,\n",
       " b'1072640827': 551,\n",
       " b'93148987': 552,\n",
       " b'1568770334': 553,\n",
       " b'2700358387': 554,\n",
       " b'3298977709': 555,\n",
       " b'2587246668': 556,\n",
       " b'2537288678': 557,\n",
       " b'1287850161': 558,\n",
       " b'2557219570': 559,\n",
       " b'322208106': 560,\n",
       " b'4072712031': 561,\n",
       " b'3006433212': 562,\n",
       " b'2194245277': 563,\n",
       " b'451317840': 564,\n",
       " b'3594879390': 565,\n",
       " b'770336801': 566,\n",
       " b'683588589': 567,\n",
       " b'782865887': 568,\n",
       " b'3350047181': 569,\n",
       " b'3253822509': 570,\n",
       " b'1561371437': 571,\n",
       " b'3761533297': 572,\n",
       " b'1842145373': 573,\n",
       " b'990507785': 574,\n",
       " b'3861090052': 575,\n",
       " b'2660754061': 576,\n",
       " b'2272634729': 577,\n",
       " b'736262792': 578,\n",
       " b'166773493': 579,\n",
       " b'2861036936': 580,\n",
       " b'2298623317': 581,\n",
       " b'681010476': 582,\n",
       " b'2193694254': 583,\n",
       " b'3047819005': 584,\n",
       " b'2556133275': 585,\n",
       " b'655721325': 586,\n",
       " b'618158373': 587,\n",
       " b'605378672': 588,\n",
       " b'3883721026': 589,\n",
       " b'554693050': 590,\n",
       " b'2716313754': 591,\n",
       " b'516841937': 592,\n",
       " b'4162525077': 593,\n",
       " b'3657335672': 594,\n",
       " b'1748492203': 595,\n",
       " b'24978365': 596,\n",
       " b'2766172404': 597,\n",
       " b'3167658465': 598,\n",
       " b'3027707150': 599,\n",
       " b'2648736219': 600,\n",
       " b'3282509573': 601,\n",
       " b'3822987687': 602,\n",
       " b'2764881224': 603,\n",
       " b'1945364081': 604,\n",
       " b'968336394': 605,\n",
       " b'1150855279': 606,\n",
       " b'3282478477': 607,\n",
       " b'3804509987': 608,\n",
       " b'156584666': 609,\n",
       " b'544449933': 610,\n",
       " b'620032149': 611,\n",
       " b'2869467989': 612,\n",
       " b'2959872768': 613,\n",
       " b'3837379530': 614,\n",
       " b'2946269473': 615,\n",
       " b'4268611130': 616,\n",
       " b'3270633490': 617,\n",
       " b'111840914': 618,\n",
       " b'2614389627': 619,\n",
       " b'4196666928': 620,\n",
       " b'2726693879': 621,\n",
       " b'2106955961': 622,\n",
       " b'1075126714': 623,\n",
       " b'458848274': 624,\n",
       " b'1432546106': 625,\n",
       " b'1524520597': 626,\n",
       " b'3815746762': 627,\n",
       " b'3205024967': 628,\n",
       " b'3021174039': 629,\n",
       " b'1240674478': 630,\n",
       " b'1805968919': 631,\n",
       " b'2242443745': 632,\n",
       " b'915098649': 633,\n",
       " b'3123712598': 634,\n",
       " b'487468823': 635,\n",
       " b'1759431765': 636,\n",
       " b'2675088061': 637,\n",
       " b'887085722': 638,\n",
       " b'256035140': 639,\n",
       " b'1567499133': 640,\n",
       " b'2541238114': 641,\n",
       " b'2127416756': 642,\n",
       " b'4164388326': 643,\n",
       " b'1959500556': 644,\n",
       " b'3781956022': 645,\n",
       " b'1849546291': 646,\n",
       " b'1171363794': 647,\n",
       " b'3444786385': 648,\n",
       " b'964607716': 649,\n",
       " b'282487230': 650,\n",
       " b'2070073073': 651,\n",
       " b'3887899389': 652,\n",
       " b'2524494796': 653,\n",
       " b'2489551967': 654,\n",
       " b'199379035': 655,\n",
       " b'1933513142': 656,\n",
       " b'4087630383': 657,\n",
       " b'1030441254': 658,\n",
       " b'3734794530': 659,\n",
       " b'216180088': 660,\n",
       " b'2062977016': 661,\n",
       " b'1353596523': 662,\n",
       " b'2956786715': 663,\n",
       " b'1006838695': 664,\n",
       " b'3669048335': 665,\n",
       " b'900914043': 666,\n",
       " b'3791989238': 667,\n",
       " b'532014578': 668,\n",
       " b'1049146719': 669,\n",
       " b'15390083': 670,\n",
       " b'883614923': 671,\n",
       " b'398134674': 672,\n",
       " b'193920233': 673,\n",
       " b'3953805126': 674,\n",
       " b'2941044791': 675,\n",
       " b'1119929505': 676,\n",
       " b'1148170980': 677,\n",
       " b'306422496': 678,\n",
       " b'3244067756': 679,\n",
       " b'2378571713': 680,\n",
       " b'1463808662': 681,\n",
       " b'611791560': 682,\n",
       " b'2420019637': 683,\n",
       " b'2789613386': 684,\n",
       " b'1379130419': 685,\n",
       " b'639409951': 686,\n",
       " b'3348106292': 687,\n",
       " b'3368369538': 688,\n",
       " b'1174496191': 689,\n",
       " b'4227630018': 690,\n",
       " b'3603006634': 691,\n",
       " b'1520557068': 692,\n",
       " b'3969691423': 693,\n",
       " b'1784363573': 694,\n",
       " b'1918945522': 695,\n",
       " b'3646961870': 696,\n",
       " b'1500569811': 697,\n",
       " b'2271075773': 698,\n",
       " b'454881517': 699,\n",
       " b'3408163802': 700,\n",
       " b'3156258711': 701,\n",
       " b'2480816831': 702,\n",
       " b'364700925': 703,\n",
       " b'3771702031': 704,\n",
       " b'2460812912': 705,\n",
       " b'3627421819': 706,\n",
       " b'906981127': 707,\n",
       " b'2962659545': 708,\n",
       " b'3751635372': 709,\n",
       " b'3527940337': 710,\n",
       " b'1387635875': 711,\n",
       " b'2932100756': 712,\n",
       " b'252627634': 713,\n",
       " b'3199685636': 714,\n",
       " b'469116393': 715,\n",
       " b'3962251454': 716,\n",
       " b'2275600529': 717,\n",
       " b'1426944148': 718,\n",
       " b'703001700': 719,\n",
       " b'240047100': 720,\n",
       " b'481236042': 721,\n",
       " b'394596900': 722,\n",
       " b'3897720034': 723,\n",
       " b'2680339394': 724,\n",
       " b'233871977': 725,\n",
       " b'3902408840': 726,\n",
       " b'1577437083': 727,\n",
       " b'4236494': 728,\n",
       " b'4176372154': 729,\n",
       " b'3308369328': 730,\n",
       " b'431015679': 731,\n",
       " b'383587770': 732,\n",
       " b'3531604778': 733,\n",
       " b'3031242464': 734,\n",
       " b'4222590338': 735,\n",
       " b'3811264923': 736,\n",
       " b'577206982': 737,\n",
       " b'637288269': 738,\n",
       " b'4261340176': 739,\n",
       " b'895953294': 740,\n",
       " b'443655774': 741,\n",
       " b'180779799': 742,\n",
       " b'1981701814': 743,\n",
       " b'3593287342': 744,\n",
       " b'439366650': 745,\n",
       " b'504326263': 746,\n",
       " b'981252249': 747,\n",
       " b'2467409062': 748,\n",
       " b'1890732196': 749,\n",
       " b'528218683': 750,\n",
       " b'3663573906': 751,\n",
       " b'3376988073': 752,\n",
       " b'3685921102': 753,\n",
       " b'1573621317': 754,\n",
       " b'3709640639': 755,\n",
       " b'3043278753': 756,\n",
       " b'168520465': 757,\n",
       " b'1196973493': 758,\n",
       " b'3679105159': 759,\n",
       " b'88554519': 760,\n",
       " b'1623287180': 761,\n",
       " b'2616577596': 762,\n",
       " b'1349230619': 763,\n",
       " b'3141188020': 764,\n",
       " b'1882928667': 765,\n",
       " b'1598750229': 766,\n",
       " b'2813753172': 767,\n",
       " b'2811661166': 768,\n",
       " b'2728047278': 769,\n",
       " b'3453760477': 770,\n",
       " b'1959697351': 771,\n",
       " b'2451582701': 772,\n",
       " b'3866177358': 773,\n",
       " b'1278116367': 774,\n",
       " b'73288922': 775,\n",
       " b'20018153': 776,\n",
       " b'2330978075': 777,\n",
       " b'2624413719': 778,\n",
       " b'3511417249': 779,\n",
       " b'2249748129': 780,\n",
       " b'1865714809': 781,\n",
       " b'3492585943': 782,\n",
       " b'567534910': 783,\n",
       " b'4106525792': 784,\n",
       " b'4247553940': 785,\n",
       " b'2217567238': 786,\n",
       " b'569536605': 787,\n",
       " b'771577243': 788,\n",
       " b'571415268': 789,\n",
       " b'3257773536': 790,\n",
       " b'661151794': 791,\n",
       " b'2943972367': 792,\n",
       " b'1270740692': 793,\n",
       " b'3968152684': 794,\n",
       " b'2902724458': 795,\n",
       " b'524529500': 796,\n",
       " b'2860912043': 797,\n",
       " b'3282950773': 798,\n",
       " b'386145959': 799,\n",
       " b'2411451936': 800,\n",
       " b'749863583': 801,\n",
       " b'1815238419': 802,\n",
       " b'4165509784': 803,\n",
       " b'1792619547': 804,\n",
       " b'3223915354': 805,\n",
       " b'2672443109': 806,\n",
       " b'2893327845': 807,\n",
       " b'614599699': 808,\n",
       " b'645414373': 809,\n",
       " b'691612308': 810,\n",
       " b'3709066919': 811,\n",
       " b'35565661': 812,\n",
       " b'3156280563': 813,\n",
       " b'522776668': 814,\n",
       " b'1890318514': 815,\n",
       " b'167200370': 816,\n",
       " b'331502829': 817,\n",
       " b'3841001673': 818,\n",
       " b'433510318': 819,\n",
       " b'2799306064': 820,\n",
       " b'2162567305': 821,\n",
       " b'43826519': 822,\n",
       " b'2863925078': 823,\n",
       " b'1020875748': 824,\n",
       " b'2499645933': 825,\n",
       " b'3433923309': 826,\n",
       " b'1275857085': 827,\n",
       " b'2766269208': 828,\n",
       " b'4012638060': 829,\n",
       " b'1455080965': 830,\n",
       " b'3582004219': 831,\n",
       " b'1320099877': 832,\n",
       " b'2444012189': 833,\n",
       " b'2133006016': 834,\n",
       " b'3591151863': 835,\n",
       " b'2488685741': 836,\n",
       " b'2147677924': 837,\n",
       " b'2419536569': 838,\n",
       " b'455279111': 839,\n",
       " b'4011347584': 840,\n",
       " b'2664817652': 841,\n",
       " b'4128216168': 842,\n",
       " b'1018886228': 843,\n",
       " b'1733597545': 844,\n",
       " b'913235409': 845,\n",
       " b'2377207090': 846,\n",
       " b'168764910': 847,\n",
       " b'1120323401': 848,\n",
       " b'630652349': 849,\n",
       " b'1833128849': 850,\n",
       " b'661907327': 851,\n",
       " b'781002123': 852,\n",
       " b'396008921': 853,\n",
       " b'210626346': 854,\n",
       " b'4075574010': 855,\n",
       " b'4039039716': 856,\n",
       " b'1264179509': 857,\n",
       " b'320069736': 858,\n",
       " b'521893181': 859,\n",
       " b'113021847': 860,\n",
       " b'2187183134': 861,\n",
       " b'1066906993': 862,\n",
       " b'2927241033': 863,\n",
       " b'1847581168': 864,\n",
       " b'296271448': 865,\n",
       " b'3390146258': 866,\n",
       " b'2532791342': 867,\n",
       " b'2413089165': 868,\n",
       " b'2694023881': 869,\n",
       " b'3697799491': 870,\n",
       " b'2661941135': 871,\n",
       " b'227195072': 872,\n",
       " b'3430986998': 873,\n",
       " b'2998686163': 874,\n",
       " b'3550832930': 875,\n",
       " b'552316934': 876,\n",
       " b'2116294822': 877,\n",
       " b'1697658257': 878,\n",
       " b'762911677': 879,\n",
       " b'2469046391': 880,\n",
       " b'3560210941': 881,\n",
       " b'2683024755': 882,\n",
       " b'1680807465': 883,\n",
       " b'4202081674': 884,\n",
       " b'3505511916': 885,\n",
       " b'1176550521': 886,\n",
       " b'457456094': 887,\n",
       " b'2622502489': 888,\n",
       " b'1048933817': 889,\n",
       " b'576776993': 890,\n",
       " b'1583577411': 891,\n",
       " b'639318962': 892,\n",
       " b'2706381153': 893,\n",
       " b'242737714': 894,\n",
       " b'3537668660': 895,\n",
       " b'3614812788': 896,\n",
       " b'2848778083': 897,\n",
       " b'2393678699': 898,\n",
       " b'1078537007': 899,\n",
       " b'4122118697': 900,\n",
       " b'3383460510': 901,\n",
       " b'610133863': 902,\n",
       " b'2818464367': 903,\n",
       " b'3579660533': 904,\n",
       " b'589091561': 905,\n",
       " b'3080858756': 906,\n",
       " b'1162467941': 907,\n",
       " b'4078937413': 908,\n",
       " b'3825327054': 909,\n",
       " b'2338481531': 910,\n",
       " b'247769793': 911,\n",
       " b'3460014666': 912,\n",
       " b'881544435': 913,\n",
       " b'4238211626': 914,\n",
       " b'2329661605': 915,\n",
       " b'1038432983': 916,\n",
       " b'620761021': 917,\n",
       " b'2325340243': 918,\n",
       " b'491840802': 919,\n",
       " b'279699155': 920,\n",
       " b'988160405': 921,\n",
       " b'4284267710': 922,\n",
       " b'689961313': 923,\n",
       " b'3467219840': 924,\n",
       " b'4051807703': 925,\n",
       " b'2623537385': 926,\n",
       " b'842198355': 927,\n",
       " b'1066372954': 928,\n",
       " b'1134784132': 929,\n",
       " b'2256989025': 930,\n",
       " b'3186268838': 931,\n",
       " b'4005594979': 932,\n",
       " b'1093568005': 933,\n",
       " b'3029988578': 934,\n",
       " b'1931485787': 935,\n",
       " b'1201984883': 936,\n",
       " b'1855764195': 937,\n",
       " b'2791232352': 938,\n",
       " b'3691949299': 939,\n",
       " b'1386311924': 940,\n",
       " b'4211609391': 941,\n",
       " b'659447721': 942,\n",
       " b'580008340': 943,\n",
       " b'1615303306': 944,\n",
       " b'2920617215': 945,\n",
       " b'1342961341': 946,\n",
       " b'2286248913': 947,\n",
       " b'1611246540': 948,\n",
       " b'2238882361': 949,\n",
       " b'2501280523': 950,\n",
       " b'1969578305': 951,\n",
       " b'3492312016': 952,\n",
       " b'1610329121': 953,\n",
       " b'1085490219': 954,\n",
       " b'1304321680': 955,\n",
       " b'1353265512': 956,\n",
       " b'1531952242': 957,\n",
       " b'4159195102': 958,\n",
       " b'15469418': 959,\n",
       " b'3404441712': 960,\n",
       " b'1638568594': 961,\n",
       " b'547302435': 962,\n",
       " b'5574997': 963,\n",
       " b'3692641931': 964,\n",
       " b'1075504486': 965,\n",
       " b'1838926600': 966,\n",
       " b'3525941406': 967,\n",
       " b'1507678209': 968,\n",
       " b'3839562998': 969,\n",
       " b'793834125': 970,\n",
       " b'95170590': 971,\n",
       " b'2517213950': 972,\n",
       " b'2282422131': 973,\n",
       " b'3647725503': 974,\n",
       " b'1287353407': 975,\n",
       " b'3436715440': 976,\n",
       " b'1114094430': 977,\n",
       " b'2271406145': 978,\n",
       " b'2340199364': 979,\n",
       " b'2018840392': 980,\n",
       " b'3793812304': 981,\n",
       " b'3081802439': 982,\n",
       " b'307071731': 983,\n",
       " b'3725766398': 984,\n",
       " b'1220061846': 985,\n",
       " b'2294227413': 986,\n",
       " b'2343731867': 987,\n",
       " b'1666288434': 988,\n",
       " b'430443066': 989,\n",
       " b'713380351': 990,\n",
       " b'2630238456': 991,\n",
       " b'2068379568': 992,\n",
       " b'3137272703': 993,\n",
       " b'3269497309': 994,\n",
       " b'350964414': 995,\n",
       " b'2193664477': 996,\n",
       " b'1654935837': 997,\n",
       " b'351713549': 998,\n",
       " b'2995967526': 999,\n",
       " ...}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "userIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b'3166414361' in userIndex"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style>\n",
       "    .dataframe thead tr:only-child th {\n",
       "        text-align: right;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>user_id</th>\n",
       "      <th>locale</th>\n",
       "      <th>birthyear</th>\n",
       "      <th>gender</th>\n",
       "      <th>joinedAt</th>\n",
       "      <th>location</th>\n",
       "      <th>timezone</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>3197468391</td>\n",
       "      <td>id_ID</td>\n",
       "      <td>1993</td>\n",
       "      <td>male</td>\n",
       "      <td>2012-10-02T06:40:55.524Z</td>\n",
       "      <td>Medan  Indonesia</td>\n",
       "      <td>480.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>3537982273</td>\n",
       "      <td>id_ID</td>\n",
       "      <td>1992</td>\n",
       "      <td>male</td>\n",
       "      <td>2012-09-29T18:03:12.111Z</td>\n",
       "      <td>Medan  Indonesia</td>\n",
       "      <td>420.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>823183725</td>\n",
       "      <td>en_US</td>\n",
       "      <td>1975</td>\n",
       "      <td>male</td>\n",
       "      <td>2012-10-06T03:14:07.149Z</td>\n",
       "      <td>Stratford  Ontario</td>\n",
       "      <td>-240.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1872223848</td>\n",
       "      <td>en_US</td>\n",
       "      <td>1991</td>\n",
       "      <td>female</td>\n",
       "      <td>2012-11-04T08:59:43.783Z</td>\n",
       "      <td>Tehran  Iran</td>\n",
       "      <td>210.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>3429017717</td>\n",
       "      <td>id_ID</td>\n",
       "      <td>1995</td>\n",
       "      <td>female</td>\n",
       "      <td>2012-09-10T16:06:53.132Z</td>\n",
       "      <td>NaN</td>\n",
       "      <td>420.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      user_id locale birthyear  gender                  joinedAt  \\\n",
       "0  3197468391  id_ID      1993    male  2012-10-02T06:40:55.524Z   \n",
       "1  3537982273  id_ID      1992    male  2012-09-29T18:03:12.111Z   \n",
       "2   823183725  en_US      1975    male  2012-10-06T03:14:07.149Z   \n",
       "3  1872223848  en_US      1991  female  2012-11-04T08:59:43.783Z   \n",
       "4  3429017717  id_ID      1995  female  2012-09-10T16:06:53.132Z   \n",
       "\n",
       "             location  timezone  \n",
       "0    Medan  Indonesia     480.0  \n",
       "1    Medan  Indonesia     420.0  \n",
       "2  Stratford  Ontario    -240.0  \n",
       "3        Tehran  Iran     210.0  \n",
       "4                 NaN     420.0  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "#读取数据\n",
    "users = pd.read_csv(path+\"users.csv\")\n",
    "users.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 38209 entries, 0 to 38208\n",
      "Data columns (total 7 columns):\n",
      "user_id      38209 non-null int64\n",
      "locale       38209 non-null object\n",
      "birthyear    38209 non-null object\n",
      "gender       38100 non-null object\n",
      "joinedAt     38152 non-null object\n",
      "location     32745 non-null object\n",
      "timezone     37773 non-null float64\n",
      "dtypes: float64(1), int64(1), object(5)\n",
      "memory usage: 2.0+ MB\n"
     ]
    }
   ],
   "source": [
    "users.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "FE = FeatureEng()\n",
    "\n",
    "#locale\tbirthyear\tgender\tjoinedAt\tlocation\ttimezone\n",
    "#去掉user_id列\n",
    "n_cols = users.shape[1] - 1\n",
    "cols = ['LocaleId', 'BirthYearInt', 'GenderId', 'JoinedYearMonth', 'CountryId', 'TimezoneInt']\n",
    "\n",
    "#users编码后的特征\n",
    "#userMatrix = np.zeros((n_users, n_cols), dtype=np.int)\n",
    "userMatrix = ss.dok_matrix((n_users, n_cols))\n",
    "\n",
    "for u in range(users.shape[0]): \n",
    "    userId = str(users.loc[u,'user_id'])\n",
    "    userId=bytes(userId, 'utf-8') \n",
    "    if userId in userIndex:  #在训练集或测试集中出现\n",
    "        i = userIndex[userId]\n",
    "        userMatrix[i, 0] = FE.getLocaleId(users.loc[u,'locale'])\n",
    "        userMatrix[i, 1] = FE.getBirthYearInt(users.loc[u,'birthyear'])\n",
    "        userMatrix[i, 2] = FE.getGenderId(users.loc[u,'gender'])\n",
    "        userMatrix[i, 3] = FE.getJoinedYearMonth(users.loc[u,'joinedAt'])\n",
    "        \n",
    "        #由于地点的写法不规范，该编码似乎不起作用（所有样本的特征都被编码成0了）\n",
    "        userMatrix[i, 4] = FE.getCountryId(users.loc[u,'location'])\n",
    "        \n",
    "        userMatrix[i, 5] = FE.getTimezoneInt(users.loc[u,'timezone'])\n",
    "\n",
    "# 归一化用户矩阵\n",
    "userMatrix = normalize(userMatrix, norm=\"l2\", axis=0, copy=False)\n",
    "sio.mmwrite(path+\"US_userMatrix\", userMatrix)\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 计算用户相似度矩阵，之后用户推荐系统\n",
    "userSimMatrix = ss.dok_matrix((n_users, n_users))\n",
    "\n",
    "#读取在测试集和训练集中出现的用户对\n",
    "uniqueUserPairs = pickle.load(open(path+\"FE_uniqueUserPairs.pkl\", 'rb'))\n",
    "\n",
    "#对角线元素\n",
    "for i in range(0, n_users):\n",
    "    userSimMatrix[i, i] = 1.0\n",
    "    \n",
    "#对称\n",
    "for u1, u2 in uniqueUserPairs:\n",
    "    #i = userIndex[u1]\n",
    "    #j = userIndex[u2]\n",
    "    i = u1\n",
    "    j = u2\n",
    "    if  (i, j) not in userSimMatrix:\n",
    "        #Person相关系数做为相似度度量\n",
    "        #特征：国家（locale、location）、年龄、性别、时区、地点\n",
    "        #usim = ssd.correlation(userMatrix[i,:],\n",
    "            #userMatrix[j,:])\n",
    "    \n",
    "        usim = ssd.correlation(userMatrix.getrow(i).todense(),\n",
    "          userMatrix.getrow(j).todense())\n",
    "        userSimMatrix[i, j] = usim\n",
    "        userSimMatrix[j, i] = usim\n",
    "    \n",
    "sio.mmwrite(path+\"US_userSimMatrix\", userSimMatrix)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "matrix([[1., 0., 0., ..., 0., 0., 0.]])"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "userSimMatrix.getrow(0).todense()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5.0475417160411595e-06"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "usim = ssd.correlation(userMatrix.getrow(2806).todense(),\n",
    "          userMatrix.getrow(2814).todense())\n",
    "usim"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5.0475417160411595e-06"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "userSimMatrix[2806, 2814]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
